Space Efficient Linear Time Lempel-Ziv Factorization on Constant Size Alphabets

Authors

  • Keisuke Goto
  • Hideo Bannai
Abstract

We present a new algorithm for computing the Lempel-Ziv factorization (LZ77) of a given string of length N in linear time that uses only N log N + O(1) bits of working space, i.e., a single integer array, for constant size integer alphabets. This greatly improves on the previous best space requirement for linear time LZ77 factorization (Kärkkäinen et al., CPM 2013), which requires two integer arrays of length N. Computational experiments show that, despite the added complexity, the algorithm is only around twice as slow as the previous fastest linear time algorithms.
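To make the object being computed concrete, the following is a minimal sketch of what an LZ77 factorization looks like. It is a naive quadratic-time (or worse) illustration, not the linear-time, single-array algorithm of the paper. The function name `lz77_factorize` and the factor encoding (a `(character, 0)` pair for a fresh character, a `(source_position, length)` pair otherwise, with self-overlapping copies allowed) are assumptions chosen for this example.

```python
def lz77_factorize(s):
    """Naive LZ77 factorization, for illustration only.

    Each factor is either (char, 0) for a character with no earlier
    occurrence, or (src, length) meaning "copy `length` characters
    starting at earlier position `src`". Copies may overlap the
    factor being produced (self-referencing variant).
    """
    n = len(s)
    factors = []
    i = 0
    while i < n:
        best_len, best_src = 0, -1
        # Try every earlier starting position and extend the match.
        for j in range(i):
            length = 0
            while i + length < n and s[j + length] == s[i + length]:
                length += 1
            if length > best_len:
                best_len, best_src = length, j
        if best_len == 0:
            factors.append((s[i], 0))  # fresh character
            i += 1
        else:
            factors.append((best_src, best_len))
            i += best_len
    return factors


print(lz77_factorize("abababab"))
```

On the input `"abababab"` the sketch emits two fresh characters followed by a single self-overlapping copy of length 6. Linear-time algorithms produce the same kind of factor list while avoiding the nested scans used here.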

Similar resources

Fast Lempel-Ziv Decompression in Linear Space

We consider the problem of decompressing the Lempel-Ziv 77 representation of a string S ∈ [σ]^n using a working space as close as possible to the size z of the input. The folklore solution for the problem runs in optimal O(n) time but requires random access to the whole decompressed text. A better solution is to convert LZ77 into a grammar of size O(z log(n/z)) and then stream S in optimal linear...
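The folklore solution mentioned above can be sketched as follows. This is a hedged illustration, not the small-space method of that paper: it keeps the entire decompressed text in memory, which is exactly the random-access requirement the abstract points out. The function name `lz77_decompress` and the factor encoding (`(character, 0)` for a literal, `(source_position, length)` for a copy) are assumptions made for this example.

```python
def lz77_decompress(factors):
    """Folklore LZ77 decompression, for illustration only.

    `factors` is a list of pairs: (char, 0) is a literal character,
    (src, length) copies `length` characters starting at position
    `src` of the text produced so far. Copying one character at a
    time makes self-overlapping copies work correctly.
    """
    out = []
    for src, length in factors:
        if length == 0:
            out.append(src)  # literal: `src` holds the character
        else:
            for k in range(length):
                out.append(out[src + k])  # random access into decoded text
    return "".join(out)


print(lz77_decompress([("a", 0), ("b", 0), (0, 6)]))
```

The running time is linear in the output length n, but the `out` list holds all n characters, so the working space is far larger than the z factors of the input.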

Small-space encoding LCE data structure with constant-time queries

The longest common extension (LCE) problem is to preprocess a given string w of length n so that the length of the longest common prefix between suffixes of w that start at any two given positions is answered quickly. In this paper, we present a data structure of O(zτ + n/τ) words of space which answers LCE queries in O(1) time and can be built in O(n log σ) time, where 1 ≤ τ ≤ √n is a parame...
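As a point of reference for the query defined above, here is a minimal sketch that answers an LCE query by direct character comparison, with no preprocessing. It is not the data structure of that paper; it takes time proportional to the answer rather than O(1). The function name `lce` and the use of 0-based positions are assumptions for this example.

```python
def lce(w, i, j):
    """Length of the longest common prefix of w[i:] and w[j:].

    Naive scan with no preprocessing; positions are 0-based.
    """
    n = len(w)
    length = 0
    while i + length < n and j + length < n and w[i + length] == w[j + length]:
        length += 1
    return length


print(lce("abababa", 0, 2))
```

For `"abababa"` the suffixes starting at positions 0 and 2 are `"abababa"` and `"ababa"`, so the query returns 5.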

Constructing LZ78 Tries and Position Heaps in Linear Time for Large Alphabets

We present the first worst-case linear-time algorithm to compute the Lempel-Ziv 78 factorization of a given string over an integer alphabet. Our algorithm is based on nearest marked ancestor queries on the suffix tree of the given string. We also show that the same technique can be used to construct the position heap of a set of strings in worst-case linear time, when the set of strings is give...

Lempel Ziv Computation in Small Space (LZ-CISS)

For both the Lempel Ziv 77- and 78-factorization we propose algorithms generating the respective factorization using (1 + ε)n lg n + O(n) bits (for any positive constant ε ≤ 1) working space (including the space for the output) for any text of size n over an integer alphabet in O(n/ε)...

Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Computing the LZ factorization (or LZ77 parsing) of a string is a computational bottleneck in many diverse applications, including data compression, text indexing, and pattern discovery. We describe new linear time LZ factorization algorithms, some of which require only 2n log n + O(log n) bits of working space to factorize a string of length n. These are the most space efficient linear time al...

Journal title:
  • CoRR

Volume: abs/1310.1448

Publication date: 2013